When using large samples of subjects in a test development program, one
is faced with the problem of data analysis. Often it is not economically
feasible to use all of the data that have been gathered. In view of
this, an accurate method of estimating the value of the desired statistic
for the whole sample from a smaller portion of the sample is desirable.

In test analyses, the statistics usually used for revision purposes 
are the difficulty index and the discrimination index (Wood, 1961). The 
former is the proportion of subjects who answer the item correctly. Thus, 
a high difficulty index indicates less difficult items. The discrimination 
index is a measure of how well the test item discriminates subjects with 
respect to some criterion, i.e., how well the item correlates with some 
criterion measure. In essence, the discrimination index is a measure of the 
validity of the item; a measure of how well it predicts the criterion. 
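
The two indices defined above can be illustrated with a minimal sketch. It assumes items are scored 1 (correct) or 0 (incorrect) and that the criterion is a numeric score per subject. The function names and the sample data are hypothetical, and the Pearson correlation is used here only as one common way to express how well the item correlates with the criterion.

```python
from math import sqrt


def difficulty_index(item_scores):
    """Proportion of subjects who answer the item correctly (1 = correct, 0 = incorrect)."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, criterion):
    """Pearson correlation between the 0/1 item scores and a criterion measure."""
    n = len(item_scores)
    mean_item = sum(item_scores) / n
    mean_crit = sum(criterion) / n
    cov = sum((i - mean_item) * (c - mean_crit)
              for i, c in zip(item_scores, criterion))
    var_item = sum((i - mean_item) ** 2 for i in item_scores)
    var_crit = sum((c - mean_crit) ** 2 for c in criterion)
    return cov / sqrt(var_item * var_crit)


# Hypothetical data: eight subjects, one item, and a total-score criterion.
item = [1, 1, 1, 0, 1, 0, 0, 0]
total = [9, 8, 7, 6, 5, 4, 3, 2]

print(difficulty_index(item))             # half of the subjects succeed
print(discrimination_index(item, total))  # positive: high scorers tend to succeed
```

A higher difficulty index means an easier item, as noted above; a discrimination index near zero would mean the item tells us little about the criterion.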

PROBLEM

In test construction, usually an item difficulty of 50% is aimed for. 

In programed learning it is possible to consider each frame (each question)
as a test item, the difference being that the feedback (knowledge of results)
given to the student is relatively immediate, and that the purpose of the item
is not the same as that of a similar test item. In programed learning, the
situation is constructed in order to foster learning; in a test, the situation is
constructed in order to determine how much learning has already occurred.
Thus, the question of what the optimum level of difficulty is becomes less
settled. Should most students be unable to answer the item, or should most
students be allowed to succeed? Item difficulty level is a variable in the
field of programed learning; the optimum level of difficulty is yet to be
empirically determined. Further, item difficulty is intimately related to
the concept of step size. A possible empirical measure of the step size
from item (a) to item (b) is the difference in difficulty of the two items.

Thus, the estimate of total sample item difficulty should be scaled so that
it has the property of additivity.¹

Discrimination Index 

Kelley (1939) attacked the problem of finding the best sample for
estimating the discrimination of a test item for the whole sample. He made
the following assumptions:

(1) that if 2j individuals are to be selected as the sample on which the
estimate is to be based, the best results will be achieved if j
individuals are chosen at the bottom of the distribution and j at the
top;

(2) that the whole sample is of size N, where N = 2m;

(3) that the scores are graduated;

(4) that the scores are normally distributed;

¹ See Jacobs (1962) for a discussion of issues relating to both tests and problems.
(5) that the certainty with which these two groups are differentiated is
given by

$$ f(j) = \frac{\bar{x}_u - \bar{x}_l}{S_{\bar{x}_u - \bar{x}_l}}, \qquad (1) $$

where $\bar{x}_u$ is the mean deviation score of the upper group and
$\bar{x}_l$ is the mean deviation score of the lower group.²

A pictorial representation of the situation is given in Figure 1. The
problem has now been reduced to the mathematical problem of solving equation
(1) for j such that f is maximized. However, before differentiating, the
scores need to be corrected for systematic error in order to work with the
true scores. This systematic error arises as a consequence of the particular
sampling method employed: a score that is not randomly selected,
but rather chosen because it has a certain deviation, will suffer from a
systematic error. This error can be corrected for by regressing the score
toward the mean:

$$ x'_u = r\,x_u, $$

where r is the reliability coefficient.

The standard deviation of $x'_u$ is $S\sqrt{r - r^2}$, where S is the standard
deviation of the 2m measures. All the other predicted scores will have the same
standard deviation.



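Thus

$$ \bar{x}'_u = r\,\bar{x}_u, \qquad \bar{x}'_l = r\,\bar{x}_l, \qquad S_{\bar{x}'_u} = S_{\bar{x}'_l} = \frac{S\sqrt{r - r^2}}{\sqrt{j}}. $$

Thus, the critical ratio f(j) to be maximized becomes (using $\bar{x}_l = -\bar{x}_u$,
which follows from the symmetry of the normal distribution)

$$ f(j) = \frac{\bar{x}'_u - \bar{x}'_l}{S_{\bar{x}'_u - \bar{x}'_l}} = \frac{\sqrt{2}\,r\,\bar{x}_u\,\sqrt{j}}{S\sqrt{r - r^2}}, \qquad r > 0. $$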



² This statistic is commonly known as the "critical ratio." It is clear
that the more clearly discriminable the two groups are, the larger the distance
between $\bar{x}_u$ and $\bar{x}_l$ will be, while $S_{\bar{x}_u - \bar{x}_l}$
will remain unchanged.
Fig. 1. Hypothetical distribution of N scores. The scores increase from left to right.



Since

$$ f(j) = K\sqrt{j}\,\bar{x}_u, \qquad \text{where } K = \frac{\sqrt{2}\,r}{S\sqrt{r - r^2}}, \quad r > 0, $$

then f(j) is a maximum when $\sqrt{j}\,\bar{x}_u$ is a maximum.

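Let q = j/N; then, with scores expressed in standard-deviation units,
$\bar{x}_u = z/q$, where z is the ordinate of the standard normal
distribution (0, 1) at the point x, and x is the end point of the tail which
contains the proportion q of the cases of the distribution. Thus

$$ f(x) = K\sqrt{qN}\left(\frac{z}{q}\right) = K\sqrt{N}\,\frac{z}{\sqrt{q}} = \frac{Cz}{\sqrt{q}} $$

and, since $dz/dx = -xz$ and $dq/dx = -z$,

$$ \frac{df}{dx} = C\left[\frac{dz}{dx}\,\frac{1}{\sqrt{q}} - \frac{1}{2}\,\frac{z}{q^{3/2}}\,\frac{dq}{dx}\right] = \frac{Cz}{\sqrt{q}}\left(-x + \frac{z}{2q}\right), \qquad 0 < q < .50. $$

Thus, $-x + z/(2q) = 0$; that is, $q = z/(2x)$. Kelley (1939) asserts that this
value makes f a maximum. He further asserts that $q = z/(2x)$ when q = .2702678.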
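
The stationarity condition q = z/(2x) can be checked numerically. The following is a rough sketch, not Kelley's own procedure: it assumes a standard normal distribution, uses Python's `statistics.NormalDist`, and locates the root by simple bisection. The function names and the bracketing interval are hypothetical choices made for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (0, 1)


def critical_ratio_shape(q):
    """z / sqrt(q): the part of f that depends on the tail proportion q."""
    x = STD_NORMAL.inv_cdf(1.0 - q)  # cut point leaving proportion q in the upper tail
    z = STD_NORMAL.pdf(x)            # ordinate of the normal curve at the cut point
    return z / q ** 0.5


def stationarity_gap(q):
    """-x + z / (2q), which is zero where df/dx = 0."""
    x = STD_NORMAL.inv_cdf(1.0 - q)
    z = STD_NORMAL.pdf(x)
    return -x + z / (2.0 * q)


def optimal_tail_proportion(lo=0.05, hi=0.45, tol=1e-10):
    """Bisection on the stationarity condition inside 0 < q < 0.50."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if stationarity_gap(lo) * stationarity_gap(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


q_opt = optimal_tail_proportion()
print(q_opt)  # close to the 27% figure quoted from Kelley (1939)
```

Comparing `critical_ratio_shape(q_opt)` with its value at, say, q = 0.20 or q = 0.35 shows that the ratio falls off on both sides of the root, which is consistent with the claim that the value is a maximum.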

Davis (1949) presents a table for quickly calculating discrimination
indices based on the upper and lower 27%. He improves Kelley's work by using
Fisher's z as a direct measure of the discriminating power of an item.

The advantage in using Fisher's z is that a given increase in the value of
Fisher's z has essentially a constant meaning at any part in the range of its
possible values. Davis also states that the correlation coefficients obtained
by this procedure are not greatly affected by the difficulty levels of the
test items. Thus, if similar samples of students are used to determine the
discrimination power of a set of test items, Fisher's z transformations of
these correlation coefficients can be legitimately added, subtracted, or
averaged.
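
As an illustration of why the transformed values can be averaged, here is a small sketch of Fisher's z transformation, z = arctanh(r), applied to a set of correlation coefficients. It is not the Davis table itself; the helper names and the example correlations are hypothetical.

```python
from math import atanh, tanh


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient (-1 < r < 1)."""
    return atanh(r)


def mean_correlation(correlations):
    """Average correlations on the z scale, then transform back to r."""
    mean_z = sum(fisher_z(r) for r in correlations) / len(correlations)
    return tanh(mean_z)


# Hypothetical discrimination indices for two items.
print(mean_correlation([0.2, 0.8]))  # larger than the plain average of 0.5
```

The averaged value differs from the arithmetic mean of the raw coefficients because equal steps in r do not represent equal steps in z near the ends of the range.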

Davis (1949) reports that the reliability of the discrimination indices
calculated using his table (which is entered by means of the proportion of
successes on the item in the upper and lower 27% of the sample) based on 100
cases in each tail is approximately 0.60.

Difficulty Index 

Davis (1949) also presents a method for determining the difficulty of
an item (based on the proportion of success on the item of the upper and lower
27%) which he states leads to a reliability of about .98 for a set of
difficulty indices when the size of each tail is about 100 cases. His development
is briefly outlined in the next few paragraphs with an adaptation to programed
items.

Let P represent the proportion of students of the total sample that know
the answer to an item; then P is defined as follows:

$$ P = \frac{R - \dfrac{W}{K - 1}}{N - NR}, $$

where R  = number of students giving correct answers,
      W  = number of students giving incorrect answers,
      K  = number of choices of answers for the item,
      N  = number of students in the sample,
      NR = number of students who do not reach the item in the time limit.
In programed instruction, NR = 0, since, with no time limit, all students will
have an opportunity to read all questions. Therefore, all omissions must
be considered as errors. It is possible that a student will omit a page he
is not supposed to omit, but this must be counted as an error (not following
instructions). The expression reduces to

$$ P = \frac{R - W/(K - 1)}{N}. $$
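
The correction above can be written as a short function. This is a minimal sketch of the formula as stated in the text; the function name, argument names, and the example counts are hypothetical.

```python
def corrected_proportion(right, wrong, choices, n, not_reached=0):
    """P = (R - W / (K - 1)) / (N - NR), with NR = 0 for programed items."""
    if choices < 2:
        raise ValueError("an item needs at least two possible responses")
    if n - not_reached <= 0:
        raise ValueError("no students reached the item")
    return (right - wrong / (choices - 1)) / (n - not_reached)


# Hypothetical item: 100 students, 70 correct, 30 incorrect, 4 choices.
print(corrected_proportion(right=70, wrong=30, choices=4, n=100))
```

With 4 choices, the 30 incorrect answers are taken to imply about 10 lucky guesses among the correct ones, so the estimate of students who know the answer is lower than the raw 70%.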

Furthermore, programed instructional items often will be of the fill-in-the- 
blank type. The size of the class of responses available to the student 
must be then determined from the context of the question. 

Let U be the set of all existing responses, S be the subset of U consisting
of all responses potentially available to the student to use in answering
the item, A be the subset of S consisting of the correct responses to the item,
A' be the subset of S consisting of the incorrect responses, and m(X) be the size
of the set X, or the number of elements or responses in the set X; then
K = m(A ∪ A') = m(S). For example, suppose the item calls for a real number
as the correct response. As the reals are a set of infinite size, the number
of elements in S becomes infinite, and K = m(S) → ∞. Thus, in the
limit, as n increases to infinity,

$$ P = \frac{R}{N} - \lim_{n \to \infty} \frac{W/(K - 1)}{N} = \frac{R}{N} - \lim_{n \to \infty} \frac{W/(n - 1)}{N} = \frac{R}{N} $$

(where n = K is the number of elements in S).
With a large sample, we can estimate the proportion P from data on the
highest and lowest 27% of the sample. Let $P_{est.}$ be the estimate for P
such that

$$ P_{est.} = \frac{P_H + P_L}{2}, $$

where $P_H$ is the proportion of successes in the upper 27%, and $P_L$ in
the lower 27% of the sample.
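
The estimate from the two tails can be sketched as follows. The code assumes each student has a 0/1 score on the item and a numeric criterion score used for ranking; the function name, the rounding of the tail size, and the synthetic data are hypothetical choices for illustration.

```python
def tail_proportions(item_scores, criterion, fraction=0.27):
    """Return (P_H, P_L, P_est) from the upper and lower tails of the sample."""
    n = len(item_scores)
    j = max(1, round(n * fraction))  # number of students in each tail
    order = sorted(range(n), key=lambda i: criterion[i])  # ascending by criterion
    lower, upper = order[:j], order[-j:]
    p_lower = sum(item_scores[i] for i in lower) / j
    p_upper = sum(item_scores[i] for i in upper) / j
    return p_upper, p_lower, (p_upper + p_lower) / 2


# Synthetic sample: 100 students; the item is passed by the top half only.
criterion = list(range(100))
item = [1 if score >= 50 else 0 for score in criterion]
p_h, p_l, p_est = tail_proportions(item, criterion)
print(p_h, p_l, p_est)
```

Only 54 of the 100 response records are examined here, yet the estimate agrees with the proportion passing in the whole synthetic sample; real data will of course agree less exactly.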

The Davis table transforms the two proportions $P_H$ and $P_L$ into a
difficulty index which is on a linear scale; that is, it transforms $P_{est.}$
into a standard score, multiplies it by a constant (21.066), and adds
50 to this product. This transformation yields an essentially linear scale;
a scale of proportions does not constitute such a linear scale. As is true
with most data transformations, only difficulty indices based on the same or
similar samples are comparable.
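
The transformation described above can be approximated in a few lines. This is a sketch of the verbal description only, not a reproduction of the Davis table: it assumes the "standard score" is the standard normal deviate below which the proportion P_est falls, so that easier items receive higher values. That sign convention, the function name, and the example proportions are assumptions.

```python
from statistics import NormalDist

DAVIS_SCALE = 21.066   # multiplicative constant quoted in the text
DAVIS_OFFSET = 50.0    # additive constant quoted in the text


def davis_style_difficulty(p_upper, p_lower):
    """Linear-scale difficulty index computed from the two tail proportions."""
    p_est = (p_upper + p_lower) / 2
    if not 0.0 < p_est < 1.0:
        raise ValueError("P_est must lie strictly between 0 and 1")
    # Assumption: standard score = normal deviate with P_est of the area below it.
    z = NormalDist().inv_cdf(p_est)
    return DAVIS_OFFSET + DAVIS_SCALE * z


print(davis_style_difficulty(0.70, 0.30))  # P_est = 0.50 maps to the scale midpoint
print(davis_style_difficulty(0.90, 0.60))  # an easier item under the assumed sign
```

Because the index is linear in the standard score, a difference between two index values can serve as the kind of additive step-size measure discussed earlier, provided both items were scored on the same or similar samples.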

By means of the Davis table (1949), a satisfactory estimate of item
difficulty and discrimination can be determined which does not require the
laborious calculation of the total population and which is easily applied to
a series of programed items.

SUMMARY

This paper presents an analysis of a persistent problem in the
development of programed instructional materials: the reduction of data
relating to student performance on program frames. It has been customary to
use small samples of students and to look at all of their responses in order
to make decisions about frame revision. This paper takes the other alternative
as a given, namely, that a large sample of students is used, and asks what
approaches to data reduction might be appropriate and useful. The use of a
54% sample to obtain difficulty and discrimination indices is discussed in
the light of the problems of programed instruction.